Overview



In this exercise, you will explore crime data from the District of Columbia (a.k.a., Washington, DC) from October 2019 to October 2021. Each crime record includes the geographic coordinates of the crime (recorded in EPSG 4326), the date and time it occurred, whether it was a violent or property crime, and the type of offense. Your job will be to generate maps of these crimes across the District.

This assignment will give you an opportunity to test your sf, purrr::map, and tmap skills!

Grading


The points allotted for each question are provided in highlighted red bold text (e.g., [1.0]) within the question itself. When applicable, total points for a question may represent the sum of individually graded components, which are provided in red text (e.g., [1.0]).

Points may be deducted from each question’s total:

Note: The maximum deduction is the total points value for a given question

Pay careful attention to the format of your code – not following the rules above will cost you lots of little points (and some big ones) that can really add up!

In addition to points allotted per question, you must ensure that your R Markdown document runs out-of-the-box [25% off of the total grade] – in other words, the document will knit without error. Some tips for doing so:

Make sure that you know what each function in the list below does (use ?[function name] if you do not). Do not use any functions outside of this list!

In this assignment, you may use only the following R functions (Note: If you are unclear on what a given function does, use ? to view the help file!):

  • base::()
  • base::=
  • base::<-
  • base::==
  • base::!
  • base::~
  • base::$
  • base::c
  • base::is.na
  • base::library
  • base::list
  • dplyr::filter
  • dplyr::left_join
  • dplyr::n
  • dplyr::select
  • dplyr::summarize
  • lubridate::year
  • magrittr::%>%
  • purrr::map
  • readr::read_csv
  • tidyr::pivot_wider
  • sf::st_as_sf
  • sf::st_crs
  • sf::st_join
  • sf::st_read
  • sf::st_transform
  • tibble::as_tibble
  • tmap::+
  • tmap::tmap_mode
  • tmap::tm_basemap
  • tmap::tm_dots
  • tmap::tm_polygons
  • tmap::tm_shape

Note: The packages dplyr, ggplot2, lubridate, magrittr, purrr, readr, tidyr, and tibble are all part of the tidyverse metapackage and are loaded with library(tidyverse).


Getting started


1. [0.5] Save and knit this document:

  • [0.1] Replace my name in the YAML header with yours
  • [0.1] Add the current date in the YAML header
  • [0.3] Save the .rmd file in the output folder of your project as (but replace my name with yours): problem_set_4_Evans_Brian.rmd

2. [0.5] Set up your session:

  • [0.25] Load the sf, tmap, and tidyverse libraries;
library(sf)
library(tmap)
library(tidyverse)
  • [0.25] Set the tmap mode to “plot”.
tmap_mode('plot')

Read and pre-process data


We will use the spatial dataset dc_census.geojson (a GeoJSON file) as a template upon which we will display crimes in the District of Columbia. The coordinate reference system (CRS) of these data is EPSG 32618 (Universal Transverse Mercator, zone 18N). The field GEOID is the primary key for each polygon in the dataset and is the only field that we will use for this problem set.
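As a minimal sketch of how a CRS can be inspected and changed, the example below builds a single toy point rather than reading the census file. The coordinates are hypothetical, and st_sfc() and st_point() are used only to create the toy data (they are not on the allowed function list for this problem set).

```r
library(sf)

# Hypothetical toy point in Washington, DC (longitude, latitude)
toy_point <-
  st_sfc(
    st_point(c(-77.0365, 38.8977)),
    crs = 4326)

# The EPSG code of an object's CRS is available through st_crs()
st_crs(toy_point)$epsg

# Project the point to UTM zone 18N, the CRS of the census data
toy_point_utm <-
  st_transform(toy_point, crs = 32618)

st_crs(toy_point_utm)$epsg
```

After the transformation, the coordinates are expressed in meters (easting and northing) rather than in decimal degrees.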

3. [0.75] As parsimoniously as possible, read in the census data (dc_census.geojson) [0.15], and:

  • [0.25] Remove all fields except GEOID;
  • [0.25] Change the field name GEOID from upper to lower case;
  • [0.1] Assign to your global environment with the name census.
census <-
  st_read('data/raw/shapefiles/dc_census.geojson') %>%
  select(geoid = GEOID)

Crime data, dc_crimes.csv, were obtained from Open Data DC. This is a tabular dataset and each row represents the record for an individual crime committed. Fields (columns) include:

Our goal in this problem set is to evaluate the number and spatial distribution of violent crimes that were committed in 2020.
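Restricting records to a single calendar year can be sketched with lubridate::year(), which extracts the year from a date-time. The timestamps below are hypothetical, and ymd_hms() is used only to build them (it is not on the allowed function list).

```r
library(lubridate)

# Hypothetical timestamps on either side of the 2019/2020 boundary
toy_times <-
  ymd_hms(
    c("2019-12-31 23:59:59",
      "2020-01-01 00:00:00"))

# TRUE only for the record that occurred in 2020
year(toy_times) == 2020
```

A logical vector like this one is what a row-subsetting step evaluates to decide which records to keep.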

4. [1.25] Read in the crimes dataset (dc_crimes.csv) [0.15], and:

  • [0.25] Subset crimes to where the offense_group is categorized as “violent” and the crime was committed in 2020;
  • [0.25] Remove the fields offense_group and date_time;
  • [0.5] Convert to an sf object with the same CRS as census;
  • [0.1] Assign to your global environment with the name crimes.
crimes <-
  read_csv('data/raw/dc_crimes.csv') %>%
  filter(offense_group == "violent",
         lubridate::year(date_time) == 2020) %>%
  # Negate both fields in one vector; separate negations would keep all columns
  select(!c(offense_group, date_time)) %>%
  st_as_sf(coords = c('longitude', 'latitude'),
           crs = 4326) %>%
  st_transform(st_crs(census))
## Rows: 57400 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): id, offense, offense_group
## dbl  (2): longitude, latitude
## dttm (1): date_time
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
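A note on removing more than one field with select(): separate selections are combined as a union, so negating each field on its own keeps every column. The sketch below illustrates this with a hypothetical toy table that borrows the column names of the crimes data; tibble() and names() are used for illustration only.

```r
library(dplyr)

# Hypothetical toy table with the same column names as the crimes data
toy <-
  tibble(
    id = c("a", "b"),
    offense = c("robbery", "homicide"),
    offense_group = c("violent", "violent"),
    date_time = c("2020-01-01", "2020-06-01"))

# Each negation is evaluated on its own and the results are combined,
# so all four columns survive
toy %>%
  select(!offense_group, !date_time) %>%
  names()

# Negating a single vector of names drops both columns
toy %>%
  select(!c(offense_group, date_time)) %>%
  names()
```

The second form leaves only id and offense, which is the behavior the question asks for.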

Summarize crimes by census tract


Later in this problem set (questions 8 and 10), you will plot the number and spatial distribution of crimes by census tract and type of offense. To do so, we must tabulate the number of crimes for each tract and offense.

5. [2.0] Create a shapefile that describes the number of crimes per census tract and offense:

  1. [0.5] Join the census data to the crimes dataset;
  2. [0.2] Convert the resultant object to a tibble;
  3. [0.5] Calculate the number of crimes per geoid and offense;
  4. [0.5] Reshape the resultant object such that the rows represent census tracts and column names are: geoid, robbery, assault w/dangerous weapon, sex abuse, and homicide;
  5. [0.2] Convert the resultant object to a shapefile;
  6. [0.1] Globally assign the resultant object as crimes_by_census_tract.
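crimes_by_census_tract <-
  census %>%
  # Spatial join: crimes are matched to tracts by location, not by a key
  st_join(crimes) %>%
  as_tibble() %>%
  summarize(n = n(),
            .by = c("geoid", "offense")) %>%
  # Tracts without any crimes have a missing offense after the join
  filter(!is.na(offense)) %>%
  pivot_wider(
    id_cols = geoid,
    names_from = offense,
    values_from = n,
    values_fill = 0) %>%
  # Joining back to census restores the tract geometries
  left_join(census, .)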
## Joining with `by = join_by(geoid)`
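The join, count, and reshape steps can be sketched on toy data. Everything below is hypothetical: two square "tracts" and three "crimes" built with st_sf(), st_sfc(), st_polygon(), and st_point(), which are used only to create the example (they are not on the allowed function list).

```r
library(sf)
library(dplyr)
library(tidyr)

# Two hypothetical square tracts sitting side by side
toy_tracts <-
  st_sf(
    geoid = c("t1", "t2"),
    geometry = st_sfc(
      st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
      st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0)))),
      crs = 32618))

# Three hypothetical crimes: two robberies in t1 and one homicide in t2
toy_crimes <-
  st_sf(
    offense = c("robbery", "robbery", "homicide"),
    geometry = st_sfc(
      st_point(c(0.2, 0.5)),
      st_point(c(0.7, 0.5)),
      st_point(c(1.5, 0.5)),
      crs = 32618))

toy_counts <-
  toy_tracts %>%
  # Each point is matched to the polygon that it falls within
  st_join(toy_crimes) %>%
  as_tibble() %>%
  # One row per combination of tract and offense
  summarize(n = n(),
            .by = c(geoid, offense)) %>%
  # One row per tract and one column per offense
  pivot_wider(
    id_cols = geoid,
    names_from = offense,
    values_from = n,
    values_fill = 0)

toy_counts
```

In this toy example every tract contains at least one crime, so no missing offense values arise from the join. With real data, tracts without crimes produce a row with a missing offense.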

Prepare data for plotting


6. [0.5] Combine crimes_by_census_tract, crimes, and census into a single list object [0.2]. In doing so:

  • [0.2] Within the list, assign the names n_crimes to crimes_by_census_tract, crime_locations to crimes, and tracts to census;
  • [0.1] Globally assign the resultant object with the name shapes_utm.
shapes_utm <-
  list(
    "n_crimes" = crimes_by_census_tract,
    "crime_locations" = crimes,
    "tracts" = census)

7. [1.5] Using purrr::map() for iteration [1.0], convert the CRS of the shapefiles contained in shapes_utm to EPSG 4326 [0.4] and assign the list to your global environment with the name shapes_4326 [0.1].

shapes_4326 <-
  shapes_utm %>%
  map(
    ~ st_transform(.x, crs = 4326))
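As a small sketch of what map() returns in this situation, the toy list below holds two hypothetical point geometries in EPSG 32618. The names a and b and the coordinates are made up for illustration, and st_sfc() and st_point() are used only to build the toy data.

```r
library(sf)
library(purrr)

# A hypothetical named list of spatial objects in UTM zone 18N
toy_shapes <-
  list(
    a = st_sfc(st_point(c(323000, 4307000)), crs = 32618),
    b = st_sfc(st_point(c(324000, 4308000)), crs = 32618))

# Apply the same transformation to every element of the list
toy_shapes_4326 <-
  toy_shapes %>%
  map(
    ~ st_transform(.x, crs = 4326))

# The list names are preserved, and each element now reports EPSG 4326
toy_shapes_4326 %>%
  map(
    ~ st_crs(.x)$epsg)
```

Because map() always returns a list with the same length and names as its input, the elements of the transformed list can be extracted with $ just as in the original.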

Generate maps of crime in DC


8. [1.5] Using shapes_utm, generate a static choropleth tmap of census tracts [0.5] where the fill color is determined by the number of robberies committed [1.0].

tm_shape(shapes_utm$n_crimes) +
tm_polygons(col = 'robbery')

9. [0.5] Set the tmap mode to interactive viewing:

tmap_mode("view")
## tmap mode set to interactive viewing

10. [1.0] Using shapes_4326, generate an interactive tmap where:

  • [0.25] The fill color of census tracts is determined by the number of robberies in a given tract;
  • [0.25] Homicides are displayed as clusters of points;
  • [0.25] OpenStreetMap and Esri.WorldImagery are provided as background layers;
  • [0.25] The layers are named “Robberies” and “Homicides”.
tm_basemap(
  c("OpenStreetMap",
    "Esri.WorldImagery")) +
tm_shape(shapes_4326$n_crimes,
         name = "Robberies") +
tm_polygons(col = 'robbery') +
tm_shape(shapes_4326$n_crimes,
         name = "Homicides") +
tm_dots('homicide',
        clustering = TRUE)

Extra credit! [0.25] Modify Question 10 such that the polygons are semi-transparent (Note: I have not taught transparency yet, but you can find information on how to do so with ?tm_polygons).

tm_basemap(
  c("OpenStreetMap",
    "Esri.WorldImagery")) +
tm_shape(shapes_4326$n_crimes,
         name = "Robberies") +
tm_polygons(col = 'robbery',
            alpha = .5) +
tm_shape(shapes_4326$n_crimes,
         name = "Homicides") +
tm_dots('homicide',
        clustering = TRUE)